11 research outputs found

    Affect Analysis of Radical Contents on Web Forums Using SentiWordNet

    Get PDF
    The internet has become a major tool for communication, training, fundraising, media operations, and recruitment, and these processes often take place on web forums. This paper presents a model built with SentiWordNet, WordNet, and NLTK to analyze selected web forums that include radical content. SentiWordNet is a lexical resource for supporting opinion mining that assigns a positivity score and a negativity score to each WordNet synset. The model measures and identifies the sentiment polarity and affect intensity of the content that appears in the web forums. The results show that SentiWordNet can be used for analyzing sentences that appear in web forums.
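
    The abstract does not spell out the scoring procedure, but the general approach can be illustrated with NLTK's SentiWordNet interface. The sketch below is a minimal, assumption-laden version rather than the authors' model: it averages the positive and negative scores of the first sense of each content word, and the POS mapping and averaging choices are illustrative.

        # Minimal sketch of SentiWordNet-based sentence scoring with NLTK.
        # Assumes the relevant NLTK data packages (tokenizer, POS tagger,
        # 'wordnet', 'sentiwordnet') have already been downloaded.
        import nltk
        from nltk.corpus import sentiwordnet as swn
        from nltk.corpus import wordnet as wn

        def penn_to_wn(tag):
            """Map a Penn Treebank POS tag to a WordNet POS tag."""
            if tag.startswith('J'):
                return wn.ADJ
            if tag.startswith('N'):
                return wn.NOUN
            if tag.startswith('R'):
                return wn.ADV
            if tag.startswith('V'):
                return wn.VERB
            return None

        def sentence_polarity(sentence):
            """Return (positive, negative) scores averaged over scorable tokens."""
            tokens = nltk.word_tokenize(sentence)
            pos_total, neg_total, count = 0.0, 0.0, 0
            for word, tag in nltk.pos_tag(tokens):
                wn_tag = penn_to_wn(tag)
                if wn_tag is None:
                    continue
                synsets = list(swn.senti_synsets(word, wn_tag))
                if not synsets:
                    continue
                # Simple heuristic: use the first (most frequent) sense only.
                s = synsets[0]
                pos_total += s.pos_score()
                neg_total += s.neg_score()
                count += 1
            if count == 0:
                return 0.0, 0.0
            return pos_total / count, neg_total / count

        print(sentence_polarity("This forum post is full of hateful propaganda"))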

    Sentiment Analysis Of Web Forums: Comparison Between SentiWordNet And SentiStrength

    Get PDF
    The internet has become a major tool for communication, training, fundraising, media operations, and recruitment, and these processes often take place on web forums. This paper aims to find a suitable technique for analysing selected web forums that include radical content by presenting a comparison between SentiWordNet and SentiStrength. SentiWordNet is a lexical resource for supporting opinion mining that assigns a positivity score and a negativity score to each WordNet synset. SentiStrength is a technique that was developed from comments on MySpace; it uses human-designed lexical and emotional terms together with a set of amplification, diminishing, and negation rules. The results are presented and discussed.
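
    SentiStrength itself ships with a large hand-built lexicon; the toy sketch below only illustrates the style of rule the abstract describes (term scores plus amplification, diminishing, and negation over neighbouring tokens). The lexicon entries and rule weights are invented for illustration and are not SentiStrength's.

        # Toy illustration (not SentiStrength) of a lexicon plus
        # amplification, diminishing and negation rules over adjacent tokens.
        LEXICON = {"love": 3, "great": 2, "good": 1, "bad": -1, "hate": -3}
        AMPLIFIERS = {"very", "really"}
        DIMINISHERS = {"slightly", "somewhat"}
        NEGATIONS = {"not", "no", "never"}

        def rule_based_score(text):
            tokens = text.lower().split()
            positive, negative = 1, -1      # dual scores, as SentiStrength reports
            for i, tok in enumerate(tokens):
                if tok not in LEXICON:
                    continue
                score = LEXICON[tok]
                prev = tokens[i - 1] if i > 0 else ""
                if prev in AMPLIFIERS:      # strengthen the term by one step
                    score += 1 if score > 0 else -1
                elif prev in DIMINISHERS:   # weaken the term by one step
                    score -= 1 if score > 0 else -1
                if prev in NEGATIONS:       # flip the polarity of the term
                    score = -score
                positive = max(positive, score)
                negative = min(negative, score)
            return positive, negative

        print(rule_based_score("i really hate this but the music is not bad"))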

    Sentiment Analysis: State of the Art

    Get PDF
    We present the state of the art in sentiment analysis, covering the purpose of sentiment analysis, the levels at which it is performed, and the processes that can be used to measure polarity and classify labels. Brief details about some sentiment analysis resources are also included.

    TJP: using Twitter to analyze the polarity of contexts

    Get PDF
    This paper presents our system, TJP, which uses Twitter data to analyze the polarity of contexts.

    Quantitative Assessment of Factors in Sentiment Analysis

    Get PDF
    Sentiment can be defined as a tendency to experience certain emotions in relation to a particular object or person. Sentiment may be expressed in writing, in which case determining that sentiment algorithmically is known as sentiment analysis. Sentiment analysis is often applied to Internet texts such as product reviews, websites, blogs, or tweets, where automatically determining published feeling towards a product or service is very useful to marketers or opinion analysts. The main goal of sentiment analysis is to identify the polarity of natural language text. This thesis sets out to examine quantitatively the factors that affect sentiment analysis: the text features, the sentiment lexica or resources, and the machine learning algorithms employed. The main aim is to investigate systematically the interaction between these factors and machine learning algorithms in order to improve sentiment analysis performance as compared with the opinions of human assessors. A software system known as TJP was designed and developed to support this investigation. The research reported here has three main parts. Firstly, the role of data pre-processing was investigated with TJP using a combination of features together with publicly available datasets, considering the relationship and relative importance of superficial text features such as emoticons, n-grams, negations, hashtags, repeated letters, special characters, slang, and stopwords. The resulting statistical analysis suggests that a combination of all of these features achieves better accuracy on the dataset and has a considerable effect on system performance. Secondly, the effect of human-marked-up training data was considered, since this is required by supervised machine learning algorithms. The results gained from TJP suggest that training data greatly augments sentiment analysis performance, and that the combination of training data and sentiment lexica provides the best performance; nevertheless, one particular sentiment lexicon, AFINN, contributed more than the others in the absence of training data and would therefore be appropriate for unsupervised approaches to sentiment analysis. Finally, the performance of two sophisticated ensemble machine learning algorithms was investigated. The Arbiter Tree and the Combiner Tree were chosen because neither had previously been used for sentiment analysis. The objective here was to demonstrate their applicability and effectiveness compared with that of the leading single machine learning algorithms, Naïve Bayes and Support Vector Machines. The results showed that whilst either can be applied to sentiment analysis, the Arbiter Tree ensemble algorithm achieved better accuracy than either the Combiner Tree or any single machine learning algorithm.
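
    As a rough illustration of the superficial pre-processing features listed above (emoticons, hashtags, repeated letters, negation, stopwords), a feature-marking pass might look like the sketch below; the regexes, marker tokens, and word lists are assumptions, not the thesis implementation.

        # Rough illustration of superficial text-feature handling for tweets:
        # emoticon markers, hashtag features, repeated-letter squeezing,
        # negation scope marking, and stopword removal.
        import re

        STOPWORDS = {"the", "a", "an", "is", "to", "of"}    # tiny placeholder list
        EMOTICONS = {":)": "EMO_POS", ":(": "EMO_NEG"}      # placeholder mapping
        NEGATIONS = {"not", "no", "never"}

        def preprocess(tweet):
            tokens = tweet.lower().split()
            out, negating = [], False
            for tok in tokens:
                tok = EMOTICONS.get(tok, tok)               # map emoticons to markers
                tok = re.sub(r"(.)\1{2,}", r"\1\1", tok)    # "soooo" -> "soo"
                if tok.startswith("#"):
                    out.append("HASH_" + tok[1:])           # keep hashtags as features
                    continue
                if tok in NEGATIONS:
                    negating = True                         # mark the next content word
                    continue
                if tok in STOPWORDS:
                    continue
                out.append("NOT_" + tok if negating else tok)
                negating = False
            return out

        print(preprocess("I am NOT happy with this #phone :( it is soooo slow"))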

    Parsing Thai Social Data: A New Challenge for Thai NLP

    Full text link
    Dependency parsing (DP) is a task that analyzes text for syntactic structure and the relationships between words. DP is widely used to improve natural language processing (NLP) applications in many languages, such as English. Previous work on DP generally applies to formally written language; it does not apply to the informal language used in social networks, so DP has to be researched and explored on such social network data. In this paper, we explore and identify a DP model that is suitable for Thai social network data, and then identify the appropriate linguistic unit to use as input. The results showed that the transition-based model, the improved Elkared dependency parser, outperformed the others with a UAS of 81.42%. (7 pages, 8 figures; to be published in The 14th International Joint Symposium on Artificial Intelligence and Natural Language Processing, iSAI-NLP 2019.)
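
    The parsers are compared by UAS (unlabeled attachment score), the fraction of tokens whose predicted head matches the gold head. A minimal sketch of that metric, assuming gold and predicted head indices per token, is shown below; the example sentence is hypothetical.

        # Minimal sketch of the UAS (unlabeled attachment score) metric.
        def unlabeled_attachment_score(gold_heads, predicted_heads):
            assert len(gold_heads) == len(predicted_heads)
            correct = sum(1 for g, p in zip(gold_heads, predicted_heads) if g == p)
            return correct / len(gold_heads)

        # Hypothetical example: head index per token (0 = root) for a 5-token sentence.
        gold = [2, 0, 2, 5, 3]
        pred = [2, 0, 2, 3, 3]
        print(f"UAS = {unlabeled_attachment_score(gold, pred):.2%}")   # 80.00%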

    Semi-supervised Thai Sentence Segmentation Using Local and Distant Word Representations

    Get PDF
    A sentence is typically treated as the minimal syntactic unit used to extract valuable information from long text. However, written Thai has no explicit sentence markers. Some prior work uses machine learning, but a deep learning approach has never been employed. We propose a deep learning model for sentence segmentation with three main contributions. First, we integrate n-gram embedding as a local representation to capture word groups near sentence boundaries. Second, to focus on the keywords of dependent clauses, we combine the model with a distant representation obtained from self-attention modules. Finally, because labeled data is scarce and annotation is difficult and time-consuming, we also investigate two techniques that allow us to utilize unlabeled data: Cross-View Training (CVT) as a semi-supervised learning technique, and a pre-trained language model (ELMo) to improve word representation. In the experiments, our model reduced the relative error by 7.4% and 18.5% compared with the baseline models on the Orchid and UGWC datasets, respectively. Ablation studies revealed that the main contributing factor was the adoption of n-gram features, which were further analyzed using an interpretation technique, indicating that the model utilizes the features in much the same way that humans do.
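
    The local n-gram representation can be pictured as window features around each token position, with a classifier tagging each position as a sentence boundary or not. The sketch below is illustrative only; the window size, n-gram order, and English stand-in tokens are assumptions, not the paper's configuration.

        # Sketch of local n-gram window features for boundary labelling:
        # each token position gets the word n-grams in a small surrounding window,
        # and a downstream classifier would tag the position as boundary or not.
        def ngram_window_features(tokens, i, n=2, window=2):
            feats = []
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            span = tokens[lo:hi]
            for k in range(len(span) - n + 1):
                feats.append("NGRAM=" + "_".join(span[k:k + n]))
            return feats

        tokens = ["he", "arrived", "late", "then", "he", "left"]  # stand-in for Thai word units
        for i in range(len(tokens)):
            print(i, ngram_window_features(tokens, i))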

    Using Arbiter and Combiner Tree to Classify Contexts of Data

    Get PDF
    This paper reports on the use of ensemble learning to classify the sentiment of tweets as either positive or negative. Tweets were chosen because Twitter is a popular tool and a public, human-annotated dataset was made available as part of the SemEval 2013 competition. We report on a classification approach that contrasts single machine learning algorithms with a combination of algorithms in an ensemble learning approach. The single machine learning algorithms used were the support vector machine (SVM) and Naïve Bayes (NB), while the ensemble learning methods were the arbiter tree and the combiner tree. Our system achieved F-scores with the arbiter tree of 83.57% on tweets and 93.55% on SMS messages, which was better than the base classifiers; meanwhile, the combiner tree achieved lower scores than the base classifiers.
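
    A single level of the arbiter idea can be sketched with scikit-learn: two base classifiers (NB and a linear SVM) vote, and an arbiter trained on the examples where they disagree breaks ties. The full arbiter tree applies this recursively over partitions of the data; the sketch below, including its toy dataset, is an assumption-level illustration rather than the paper's system.

        # One level of an arbiter-style ensemble: NB and a linear SVM vote,
        # and an arbiter trained on their disagreements resolves conflicts.
        import numpy as np
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.naive_bayes import MultinomialNB
        from sklearn.svm import LinearSVC

        texts = ["i love this phone", "worst service ever", "great battery life",
                 "totally useless update", "really happy with it", "awful and slow"]
        labels = np.array([1, 0, 1, 0, 1, 0])        # toy data: 1 = positive, 0 = negative

        vec = TfidfVectorizer()
        X = vec.fit_transform(texts)

        nb = MultinomialNB().fit(X, labels)
        svm = LinearSVC().fit(X, labels)

        # Train the arbiter only on training items where the base classifiers disagree.
        disagree = nb.predict(X) != svm.predict(X)
        arbiter = MultinomialNB().fit(X[disagree], labels[disagree]) if disagree.any() else None

        def ensemble_predict(new_texts):
            Xn = vec.transform(new_texts)
            p_nb, p_svm = nb.predict(Xn), svm.predict(Xn)
            out = p_nb.copy()
            conflict = p_nb != p_svm
            if conflict.any():
                out[conflict] = arbiter.predict(Xn[conflict]) if arbiter is not None else p_svm[conflict]
            return out

        print(ensemble_predict(["happy with the battery", "slow and useless"]))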